ci: R2R-compile staged DLLs (crossgen2) before nupkg pack #1
Merged
Stock dotnet/wpf DLLs in Microsoft.WindowsDesktop.App ship with ReadyToRun native code, baked in by dotnet/runtime's runtime-pack assembly step. Our fork builds the libraries via build.cmd but does not run that step, so the DLLs we ship in the InitialForce.WPF nupkg are JIT-only.

This caused stack overflows in consumers: JIT'd frames are slightly fatter than R2R'd frames, and WPF code paths that are already deep on the stack (dispatcher unhandled-exception handler -> MessageDialog.xaml -> BAML -> WPFLocalizeExtension's 800-culture iteration) overflow the 1 MB thread stack.

Add a workflow step that downloads the upstream Microsoft.NETCore.App.Crossgen2.win-x64 NuGet package (cached under .tools-cache/) and runs crossgen2 over the 4 staged DLLs in both packaging trees (InitialForce.WPF and InitialForce.WPF.RuntimeOverride). Each output is verified to contain the R2R magic before replacing the input. --targetarch matches the matrix.arch so we get win-arm64 R2R images for the arm64 build.

Verified locally: crossgen2 10.0.7 successfully R2R-compiles all 4 patched DLLs (PresentationCore, PresentationFramework, WindowsBase, System.Xaml). Output sizes grow ~0-145% (varies by symbol density), all contain the RTR magic at the expected offset, and consuming the R2R'd DLLs eliminates the deep-stack SO that the previous nupkgs exhibited.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
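The "verified to contain the R2R magic" check could look roughly like the sketch below. The workflow's actual check is not shown in this PR text, so this is an assumption about one reasonable way to do it: walk the PE headers to the CLR (COR20) header, follow its ManagedNativeHeader directory, and compare the first four bytes against the ReadyToRun signature `0x00525452` ("RTR"). The function name `has_r2r_header` is hypothetical, and composite images are not handled.

```python
import struct

READYTORUN_SIGNATURE = 0x00525452  # bytes "RTR\0" read as a little-endian uint32


def _rva_to_offset(sections, rva):
    """Map a relative virtual address to a file offset using the section table."""
    for virt_addr, virt_size, raw_ptr, raw_size in sections:
        if virt_addr <= rva < virt_addr + max(virt_size, raw_size):
            return raw_ptr + (rva - virt_addr)
    return None


def has_r2r_header(data: bytes) -> bool:
    """Return True if the PE image's ManagedNativeHeader starts with the R2R signature."""
    try:
        if len(data) < 0x40 or data[:2] != b"MZ":
            return False
        (pe_off,) = struct.unpack_from("<I", data, 0x3C)       # e_lfanew
        if data[pe_off:pe_off + 4] != b"PE\0\0":
            return False
        (num_sections,) = struct.unpack_from("<H", data, pe_off + 6)
        (opt_size,) = struct.unpack_from("<H", data, pe_off + 20)
        opt_off = pe_off + 24
        (magic,) = struct.unpack_from("<H", data, opt_off)
        # Data directories start at 112 (PE32+) or 96 (PE32) into the optional header.
        dir_off = opt_off + (112 if magic == 0x20B else 96)
        clr_rva, _ = struct.unpack_from("<II", data, dir_off + 14 * 8)  # CLR header dir
        if clr_rva == 0:
            return False  # not a managed image

        sec_off = opt_off + opt_size
        sections = []
        for i in range(num_sections):
            vsize, vaddr, rsize, rptr = struct.unpack_from("<IIII", data, sec_off + i * 40 + 8)
            sections.append((vaddr, vsize, rptr, rsize))

        cor20 = _rva_to_offset(sections, clr_rva)
        if cor20 is None:
            return False
        # ManagedNativeHeader is the last directory of the 72-byte COR20 header (offset 64).
        native_rva, _ = struct.unpack_from("<II", data, cor20 + 64)
        if native_rva == 0:
            return False  # IL-only (JIT-only) image
        native_off = _rva_to_offset(sections, native_rva)
        if native_off is None:
            return False
        (sig,) = struct.unpack_from("<I", data, native_off)
        return sig == READYTORUN_SIGNATURE
    except struct.error:
        return False  # truncated or malformed file
```

In a pipeline, a check like this would run on each crossgen2 output and the staged input would only be overwritten when it returns True, so a failed compile never silently ships a JIT-only DLL.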
This was referenced Apr 30, 2026
oysteinkrog pushed a commit that referenced this pull request May 12, 2026
Each call to UIElement.InputHitTest(Point, out, out, out) allocated four small heap objects: PointHitTestParameters, InputHitTestResult, and the two callback delegates (filter + result). At ~60 Hz cursor movement across a moderately deep visual tree, this fires ~5-50k times per scenario. The 2026-05-11 deep-dive (autoresearch/deep-dive-2026-05-11/T1-point-allocations.md) flagged this as the #1 contributor to the ~71 MB combined System.Windows.Point allocation budget across take-open + playback, with estimated savings of 30-40 MB.

Three changes:

- The filter callback's body uses only the `currentNode` argument and static UIElementHelper helpers, with no `this` capture. Make it `private static` and cache one shared HitTestFilterCallback delegate as a static readonly field.
- Cache a single PointHitTestParameters wrapper per thread via [ThreadStatic]. PointHitTestParameters.SetHitPoint() (already internal) mutates the inner Point before each VisualTreeHelper.HitTest call.
- Add Acquire/Release pooling to the nested InputHitTestResult class. The HitTestResultCallback's delegate target IS the instance, so the pool stores the (instance, callback) pair to preserve binding across cycles. On rare nested reentrancy, Acquire falls back to a fresh instance (the same single-slot pattern as the existing StreamGeometryCallbackContext pool). Result and HitTestResult are captured into locals BEFORE Release so the post-traversal iteration uses only stable values.

VisualTreeHelper.HitTest is synchronous and consumes the parameters during traversal (no retention past return). The callbacks (filter + result) don't reinvoke InputHitTest, so reentrancy within one traversal is impossible. Reentrancy from the post-traversal contentHost.InputHitTest chain happens AFTER Release, so the pool slot is repopulated by the time recursion would run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
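The single-slot Acquire/Release pattern described in the third change can be sketched in a few lines. This is an illustration in Python, not the C# from the commit: `HitResult`, `hit_test`, and `traverse` are hypothetical stand-ins, thread affinity is ignored, and the bound method stored on the instance plays the role of the delegate whose target is the pooled object.

```python
class HitResult:
    """Hypothetical stand-in for the pooled result object."""

    _slot = None  # single-slot pool: holds at most one idle instance

    def __init__(self):
        self.hit = None
        # Created once per instance and reused, like the cached (instance, callback) pair.
        self.callback = self.on_hit

    def on_hit(self, node):
        self.hit = node

    @classmethod
    def acquire(cls):
        inst = cls._slot
        if inst is None:
            return cls()      # slot empty (first use or nested call): fresh instance
        cls._slot = None      # take the pooled instance out of the slot
        return inst

    def release(self):
        self.hit = None       # drop references so the pool does not keep nodes alive
        type(self)._slot = self


def hit_test(point, traverse):
    """Run a synchronous traversal and return what the callback recorded."""
    result = HitResult.acquire()
    try:
        traverse(point, result.callback)
        hit = result.hit      # capture into a local BEFORE release
    finally:
        result.release()
    return hit                # only the stable local is used from here on
```

Because the slot is emptied on acquire, a nested call while an instance is checked out simply allocates, which keeps the fast path allocation-free without needing any reentrancy bookkeeping.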
oysteinkrog pushed a commit that referenced this pull request May 16, 2026
Commit 7831813 ("wpf-perf(big-win T4): pool AdornerLayer._zOrderMap value snapshot") shipped a per-instance object[] snapshot buffer shared between MeasureOverride and ArrangeOverride to eliminate ~170 MB of per-pass DictionaryEntry[] allocations during MotionCatalyst take-open.

Defect: Adorner.Measure / Adorner.Arrange callouts can re-enter the same AdornerLayer's MeasureOverride/ArrangeOverride via a nested layout pass. A naïve shared field lets the inner call's CopyTo overwrite the outer pass's snapshot, and its terminal Array.Clear nulls the slots the outer is still iterating. The outer then reads a null reference and the layout throws, leaving MotionCatalyst with a completely blank canvas on take-open.

Fix: lease pattern. Each call captures the current field value into a local, immediately nulls the field (so any re-entrant call allocates its own buffer rather than aliasing), iterates on the local, and at end of pass restores its buffer to the field, keeping whichever buffer (own or the one a nested call left behind) is larger.

Steady state on the non-re-entrant path remains zero-allocation: the field holds the grown buffer, every subsequent call leases-clears-copies-iterates-clears-restores in place. Re-entrant calls pay one object[] allocation per nesting level, matching the worst case of the pre-7831813a baseline.

Validated end-to-end via MCP UI screenshots on MotionCatalyst:

- HEAD before fix: take-open shows fully black canvas
- HEAD + this fix: identical to vanilla upstream/release/10.0 (Carl Hansen golf swing, Frame 0/1240, both video viewports rendered, Pressure & Stance heatmap, Launch Monitor, all data boxes populated, playback toggles cleanly)

All 358 perf commits in PRs #1-#4 preserved.
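A minimal sketch of the lease pattern described above, written in Python rather than the commit's C#. The `Layer` class, its `layout_pass` method, and the `allocations` counter are hypothetical and exist only to make the behavior observable; the real code operates on an object[] field inside AdornerLayer.

```python
class Layer:
    """Hypothetical stand-in for a layout container with a reusable snapshot buffer."""

    def __init__(self):
        self._snapshot = None   # shared buffer; None while some call holds the lease
        self.allocations = 0    # instrumentation for the example only

    def layout_pass(self, items, visit):
        buf = self._snapshot    # lease: take whatever buffer the field holds...
        self._snapshot = None   # ...and null the field so a re-entrant call cannot alias it
        n = len(items)
        if buf is None or len(buf) < n:
            buf = [None] * n    # first use, growth, or re-entrant call: allocate
            self.allocations += 1
        buf[:n] = items         # snapshot the current items
        try:
            for i in range(n):
                visit(buf[i])   # may re-enter layout_pass on this same layer
        finally:
            for i in range(n):
                buf[i] = None   # clear so the pooled buffer holds no stale references
            other = self._snapshot  # buffer a nested call may have left behind
            if other is None or len(other) < len(buf):
                self._snapshot = buf  # keep whichever buffer is larger
```

The key property is that the outer pass iterates a local that no other call can reach, so a nested pass can copy into and clear its own buffer without touching the one being iterated.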
Summary
Adds a crossgen2 (ReadyToRun) compilation step to the nupkg build pipeline so the published `InitialForce.WPF` and `InitialForce.WPF.RuntimeOverride` nupkgs contain native code, matching stock `Microsoft.WindowsDesktop.App` behavior.
Why
Stock dotnet/wpf DLLs ship with R2R native code baked in by dotnet/runtime's runtime-pack assembly step. Our fork builds the libraries via `build.cmd` but does not run that step, so the DLLs in our nupkgs were JIT-only.
JIT'd frames are slightly fatter than R2R'd frames. WPF code paths that are already deep on the stack — notably the dispatcher unhandled-exception handler loading `MessageDialog.xaml` → BAML callbacks → WPFLocalizeExtension's 800-culture iteration — overflowed the 1 MB thread stack in consumers, taking the process down instead of merely showing the user an error.
What changed
Verified locally
Companion fix
There is also a master-side defense-in-depth fix in InitialForce/ScDesktop#6790 that defers error-dialog construction off the deep dispatcher stack. Either fix alone resolves the stack overflow; together they give belt-and-suspenders protection to any consumer of our WPF nupkgs.
Test plan
🤖 Generated with Claude Code